1.
bioRxiv ; 2023 Oct 16.
Article in English | MEDLINE | ID: mdl-37905033

ABSTRACT

The rapid expansion of protein sequence and structure databases has resulted in a significant number of proteins with ambiguous or unknown function. While advances in machine learning techniques hold great potential to fill this annotation gap, current methods for function prediction are unable to associate global function reliably to the specific residues responsible for that function. We address this issue by introducing PARSE (Protein Annotation by Residue-Specific Enrichment), a knowledge-based method which combines pre-trained embeddings of local structural environments with traditional statistical techniques to identify enriched functions with residue-level explainability. For the task of predicting the catalytic function of enzymes, PARSE achieves comparable or superior global performance to state-of-the-art machine learning methods (F1 score > 85%) while simultaneously annotating the specific residues involved in each function with much greater precision. Since it does not require supervised training, our method can make one-shot predictions for very rare functions and is not limited to a particular type of functional label (e.g., Enzyme Commission numbers or Gene Ontology codes). Finally, we leverage the AlphaFold Structure Database to perform functional annotation at a proteome scale. By applying PARSE to the dark proteome (predicted structures which cannot be classified into known structural families), we predict several novel bacterial metalloproteases. Each of these proteins shares a strongly conserved catalytic site despite highly divergent sequences and global folds, illustrating the value of local structure representations for new function discovery.
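
The abstract above does not say which statistical test PARSE uses to decide that a function is "enriched". As a generic illustration only, and not the authors' procedure, a one-sided hypergeometric test is a common way to ask whether a functional label appears among a set of retrieved sites more often than chance would predict. The function name and all parameter names below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, drawn, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    population: total number of reference sites (assumed background set)
    annotated:  reference sites carrying the functional label of interest
    drawn:      number of sites retrieved for the query residue
    hits:       retrieved sites that carry the label
    """
    if not (0 <= annotated <= population and 0 <= drawn <= population):
        raise ValueError("annotated and drawn must lie in [0, population]")
    lower = max(hits, 0)
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # Sum the probability mass of every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(lower, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 3 of 10 reference sites carry the label, and all 3 retrieved sites are hits.
    print(hypergeom_enrichment_pvalue(population=10, annotated=3, drawn=3, hits=3))
```

A small p-value indicates that the label is over-represented among the retrieved sites; in practice such values would also need correction for multiple testing across labels.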

2.
Nat Methods ; 20(9): 1269-1270, 2023 09.
Article in English | MEDLINE | ID: mdl-37580560
3.
Nat Methods ; 20(2): 165-167, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36627451
4.
Protein Sci ; 32(2): e4541, 2023 02.
Article in English | MEDLINE | ID: mdl-36519247

ABSTRACT

The identification and characterization of the structural sites which contribute to protein function are crucial for understanding biological mechanisms, evaluating disease risk, and developing targeted therapies. However, the quantity of known protein structures is rapidly outpacing our ability to functionally annotate them. Existing methods for function prediction either do not operate on local sites, suffer from high false positive or false negative rates, or require large site-specific training datasets, necessitating the development of new computational methods for annotating functional sites at scale. We present COLLAPSE (Compressed Latents Learned from Aligned Protein Structural Environments), a framework for learning deep representations of protein sites. COLLAPSE operates directly on the 3D positions of atoms surrounding a site and uses evolutionary relationships between homologous proteins as a self-supervision signal, enabling learned embeddings to implicitly capture structure-function relationships within each site. Our representations generalize across disparate tasks in a transfer learning context, achieving state-of-the-art performance on standardized benchmarks (protein-protein interactions and mutation stability) and on the prediction of functional sites from the Prosite database. We use COLLAPSE to search for similar sites across large protein datasets and to annotate proteins based on a database of known functional sites. These methods demonstrate that COLLAPSE is computationally efficient, tunable, and interpretable, providing a general-purpose platform for computational protein analysis.
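
The site search described above can be pictured as a nearest-neighbor lookup over embedding vectors. The sketch below is a minimal illustration that assumes each site has already been reduced to a fixed-length vector; it does not use the COLLAPSE software, and the function names, the choice of cosine similarity, and the toy data are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, database, k=3):
    """Return the k (site_id, similarity) pairs most similar to the query vector.

    database: dict mapping a hypothetical site identifier to its embedding vector.
    """
    scored = [(site_id, cosine_similarity(query, vec)) for site_id, vec in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy two-dimensional "embeddings"; real site embeddings would be much longer.
    toy_db = {"site_a": [1.0, 0.0], "site_b": [0.0, 1.0], "site_c": [1.0, 1.0]}
    for site_id, score in top_k_similar([1.0, 0.0], toy_db, k=2):
        print(site_id, round(score, 3))
```

A linear scan like this is adequate for small collections; searching large protein datasets would normally rely on an approximate nearest-neighbor index instead.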


Subjects
Proteins, Software, Proteins/chemistry, Protein Conformation
5.
Nat Commun ; 13(1): 746, 2022 02 08.
Article in English | MEDLINE | ID: mdl-35136054

ABSTRACT

The task of protein sequence design is central to nearly all rational protein engineering problems, and enormous effort has gone into the development of energy functions to guide design. Here, we investigate the capability of a deep neural network model to automate design of sequences onto protein backbones, having learned directly from crystal structure data and without any human-specified priors. The model generalizes to native topologies not seen during training, producing experimentally stable designs. We evaluate the generalizability of our method to a de novo TIM-barrel scaffold. The model produces novel sequences, and high-resolution crystal structures of two designs show excellent agreement with in silico models. Our findings demonstrate the tractability of an entirely learned method for protein sequence design.


Subjects
Deep Learning, Protein Engineering/methods, Amino Acid Sequence/genetics, Computer Simulation, X-Ray Crystallography, Molecular Models, Protein Domains/genetics, Protein Folding
6.
Pac Symp Biocomput ; 27: 10-21, 2022.
Article in English | MEDLINE | ID: mdl-34890132

ABSTRACT

The three-dimensional structures of proteins are crucial for understanding their molecular mechanisms and interactions. Machine learning algorithms that are able to learn accurate representations of protein structures are therefore poised to play a key role in protein engineering and drug development. The accuracy of such models in deployment is directly influenced by training data quality. The use of different experimental methods for protein structure determination may introduce bias into the training data. In this work, we evaluate the magnitude of this effect across three distinct tasks: estimation of model accuracy, protein sequence design, and catalytic residue prediction. Most protein structures are derived from X-ray crystallography, nuclear magnetic resonance (NMR), or cryo-electron microscopy (cryo-EM); we trained each model on datasets consisting of either all three structure types or of only X-ray data. We find that across these tasks, models consistently perform worse on test sets derived from NMR and cryo-EM than they do on test sets of structures derived from X-ray crystallography, but that the difference can be mitigated when NMR and cryo-EM structures are included in the training set. Importantly, we show that including all three types of structures in the training set does not degrade test performance on X-ray structures, and in some cases even increases it. Finally, we examine the relationship between model performance and the biophysical properties of each method, and recommend that the biochemistry of the task of interest should be considered when composing training sets.


Subjects
Computational Biology, Proteins, Algorithms, Cryoelectron Microscopy, X-Ray Crystallography, Humans, Protein Conformation
7.
ChemRxiv ; 2020 Mar 20.
Article in English | MEDLINE | ID: mdl-32511288

ABSTRACT

The most rapid path to discovering treatment options for the novel coronavirus SARS-CoV-2 is to find existing medications that are active against the virus. We have focused on identifying repurposing candidates for the transmembrane serine protease family member II (TMPRSS2), which is critical for entry of coronaviruses into cells. Using known 3D structures of close homologs, we created seven homology models. We also identified a set of serine protease inhibitor drugs, generated several conformations of each, and docked them into our models. We used three known chemical (non-drug) inhibitors and one validated inhibitor of TMPRSS2 in MERS as benchmark compounds and found six compounds with predicted high binding affinity in the range of the known inhibitors. We also showed that a previously published weak inhibitor, Camostat, had a significantly lower binding score than our six compounds. All six compounds are anticoagulants with significant and potentially dangerous clinical effects and side effects. Nonetheless, if these compounds significantly inhibit SARS-CoV-2 infection, they could represent a potentially useful clinical tool.

8.
Pac Symp Biocomput ; 25: 463-474, 2020.
Article in English | MEDLINE | ID: mdl-31797619

ABSTRACT

Millions of Americans are affected by rare diseases, many of which have poor survival rates. However, the small market size of individual rare diseases, combined with the time and capital requirements of pharmaceutical R&D, has hindered the development of new drugs for these cases. A promising alternative is drug repurposing, whereby existing FDA-approved drugs might be used to treat diseases different from their original indications. In order to generate drug repurposing hypotheses in a systematic and comprehensive fashion, it is essential to integrate information from across the literature of pharmacology, genetics, and pathology. To this end, we leverage a newly developed knowledge graph, the Global Network of Biomedical Relationships (GNBR). GNBR is a large, heterogeneous knowledge graph comprising drug, disease, and gene (or protein) entities linked by a small set of semantic themes derived from the abstracts of biomedical literature. We apply a knowledge graph embedding method that explicitly models the uncertainty associated with literature-derived relationships and uses link prediction to generate drug repurposing hypotheses. This approach achieves high performance on a gold-standard test set of known drug indications (AUROC = 0.89) and is capable of generating novel repurposing hypotheses, which we independently validate using external literature sources and protein interaction networks. Finally, we demonstrate the ability of our model to produce explanations of its predictions.
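
For readers unfamiliar with the AUROC figure quoted above, the metric is the probability that a randomly chosen true drug-disease pair is scored above a randomly chosen false pair. The sketch below computes it by direct pairwise comparison on made-up scores; it is a textbook definition, not the evaluation code used in the paper, and the function name and toy data are assumptions.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half a win).

    labels: iterable of 1 (known indication) or 0 (non-indication)
    scores: predicted link scores in the same order as labels
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example: two true pairs and two false pairs with hypothetical model scores.
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a rank-based computation is the usual choice for large test sets.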


Subjects
Drug Repositioning, Automated Pattern Recognition, Computational Biology, Humans, Knowledge Bases, Rare Diseases/drug therapy
9.
Sci Adv ; 3(3): e1600955, 2017 Mar.
Article in English | MEDLINE | ID: mdl-28435858

ABSTRACT

Studies of neural pathways that contribute to loss and recovery of function following paralyzing spinal cord injury require devices for modulating and recording electrophysiological activity in specific neurons. These devices must be sufficiently flexible to match the low elastic modulus of neural tissue and to withstand repeated strains experienced by the spinal cord during normal movement. We report flexible, stretchable probes consisting of thermally drawn polymer fibers coated with micrometer-thick conductive meshes of silver nanowires. These hybrid probes maintain low optical transmission losses in the visible range and impedance suitable for extracellular recording under strains exceeding those occurring in mammalian spinal cords. Evaluation in freely moving mice confirms the ability of these probes to record endogenous electrophysiological activity in the spinal cord. Simultaneous stimulation and recording is demonstrated in transgenic mice expressing channelrhodopsin 2, where optical excitation evokes electromyographic activity and hindlimb movement correlated to local field potentials measured in the spinal cord.


Subjects
Biocompatible Coated Materials, Implanted Electrodes, Nanowires, Spinal Cord/physiology, Animals, Electric Stimulation, Male, Mice, Transgenic Mice